09. Data Normalization

Lesson 4 -- Data Normalization

Pre-processing done on FIFA dataset

Here is a link to the Google spreadsheet with the solution dataset

Create Player Ranking by Value

To create a ranking for the players by value, there were several pre-processing steps that needed to be completed.

  1. The numerical value needed to be extracted or parsed out from the Value column. This required some Excel parsing using INDEX and ROW functions in columns titled character and digit.
  2. Next the value needed to be normalized to ensure that values in thousands (K) and millions (M) were standardized. See how the Excel IF function was used to in combination with the character column to get the normalized value.
  3. From there on, the player was ranked based on the new normalized value data in the normalized column.

Create Normalized Height

  1. Similarly, the get the normalized height, various Excel functions were used to obtain a normalized height column titled normalized2.

Create Normalized Length of Time since Joining

  1. To create the length of time since the player had joined, the Joined column was used to create a joined normalized column by using Excel's TODAY function.

Additional Note

To create outlier thresholds, you can use the following formulae: IQR*1.5 + Q3 value